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Abstract 

A limitation of many clustering algorithms is the requirement to tune adjustable pa¬ 
rameters for each application or even for each dataset. Some techniques require an a priori 
estimate of the number of clusters while density-based techniques usually require a scale 
parameter. Other parametric methods, such as mixture modeling, make assumptions about 
the underlying cluster distributions. Here we introduce a non-parametric clustering method 
that does not involve tunable parameters and only assumes that clusters are unimodal, in 
the sense that they have a single point of maximal density when projected onto any line, 
and that clusters are separated from one another by a separating hyperplane of relatively 
lower density. The technique uses a non-parametric variant of Hartigan’s dip statistic us¬ 
ing isotonic regression as the kernel operation repeated at every iteration. We compare 
the method against k-means-|—b, DBSCAN, and Gaussian mixture methods and show in 
simulations that it performs better than these standard methods in many situations. The 
algorithm is suited for low-dimensional datasets with a large number of observations, and 
was motivated by the problem of “spike sorting” in neural electrical recordings. Source code 
is freely available. 

Keywords: non-parametric, spike sorting, density-based clustering 
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1 Introduction 


Unsupervised data clustering is a methodology for automatically partitioning a set of data points 
in a manner that reflects the underlying structure of the data. In many clustering applications 
with continuous data in p dimensions, clusters are expected to have a core region of high density 
and to be separated from one another by a region of relatively lower density. The motivating 
application for the authors is spike sorting of neuron firing events recorded electrically, for which 


this property has been found to hold experimentally (Tiganj & Mboup 2011, Vargas-Irwin & 


Donoghue||2007 ). We expect this work to be useful in a variety of other clustering applications. 


particularly in cases with a large number of observations in a low dimensional feature space. 

A limitation of most clustering algorithms is the need to tune a set of adjustable parameters. 


The adjustments may be per application, or even per dataset. For k-means (Lloyd 1982), the 
adjustable parameter is K, the prospectively estimated number of clusters. For large datasets 
where dozens of clusters are present, the choice of K is especially problematic. In addition, 
the output of k-means depends heavily on the initialization step (choosing seed points), and the 
algorithm is often repeated several times to obtain a more globally optimal solution. K-means-|—I- 
( Arthur fc Vassilvitskii|[^007 ) does a better job at seeding, but some rerunning is still required. 
Even with optimal seeding, if some clusters are small or sparse relative to the dominant cluster 
then they are often merged into nearby clusters. In general, k-means tends to favor artificially 
splitting larger clusters at the expense of merging smaller ones. A further limitation is that the 
algorithm assumes isotropic cluster distributions with equal populations and equal variances. 

Gaussian mixture modeling (GMM), usually solved using expectation-maximization (EM) 


iterations (Dempster et al. 1977), is more flexible than k-means since it allows each cluster to 
be assigned its own multivariate normal distribution. Many variations exist, some of which are 
outlined in ( Murphy]|2012 Gh. 11). While some implementations require prospective knowledge 
of the number of clusters (McLachlan & Peel 2000 Ch. 8), other implementations consider 


this as a free variable (e.g. Roberts et al. 1998). The main limitation is that clusters must be 
well modeled by Gaussians; furthermore, as in k-means, it can be difficult to find the global 
solution, especially when the number of clusters is large. Recently, mixture models with skew 
non-Gaussian components have been developed (e.g. Eriihwirth-Schnatter fc Pyne|2010' Browne 


& McNicholas 2015). However, the increase in model complexity results in a more challenging 


optimization problem. 


Hierarchical clustering (Zaki & Meira Jr 2014 Gh. 14) does not require specification of the 
number of clusters ahead of time, but this is because the output is a dendrogram rather than 
a partition. Thus it cannot immediately be applied to our application of interest. There is 
a way to obtain an automated partition from the dendrogram, but this involves specifying a 
criteria for cutting the binary tree (much like specifying K). Other choices need to be made for 
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agglomerative hierarchical clustering in order to determine which clusters are merged at each 
iteration. Furthermore, hierarchical clustering has time complexity at least O(n^) where n is the 
size of the dataset ( Zaki fc Meira Jr]|2014 Sec. 14.2.3). 

Density-based clustering techniques such as DBSCAN ( Ester et al.|1996 | are promising since 
they do not make assumptions about data distributions, so they can handle clusters with ar¬ 
bitrary non-convex shapes. The drawback is that two parameters must be adjusted depending 
on the application, including e, a parameter of scale. The algorithm is especially sensitive to 
this parameter in higher dimensions. A further limitation is that if the clusters substantially 
differ in density, then no choice of e will simultaneously handle the entire dataset. Thus a 
scale-independent method using data density is desirable. 

Other density-based techniques, such as ( Cheng|1995 ), involve the initial step of constructing 
a continuous non-parametric probability density function ( Zaki fc Meira Jr]|2014 Ch. 15). The 
basic version of the kernel density method (Rosenblatt 1956 Parzen 19621 involves specifying 
a spatial scale parameter (the so-called bandwidth), and suffers from the same problem as DB¬ 


SCAN. Variations of this method can automatically estimate an optimal bandwidth (Silverman 


19861, and can even derive a value that is spatially dependent. There are many density-estimation 


methods to choose from (incluuding Rodriguez & Laio 20141, but they often depend on adjustable 
distance parameters. In general, these methods become computationally intractable in higher 
dimensions. 

Here we introduce an efficient density-based, scale-independent clustering technique suited 
for situations where clusters are expected to be unimodal and when any pair of distinct clusters 
may be separated by a hyperplane. We say a cluster is unimodal if it arises from a distribution 
that has a single point of maximum density when projected onto ony line. Thus, our assumption 
is that the projection of any two clusters onto the normal of the dividing hyperplane gives a 
bimodal distribution separating the clusters at its local minimum. Loosely speaking, this is 
guaranteed when the clusters are sufficiently spaced and have convex shapes. 

In addition to being density-based, our technique has the flavor of agglomerative hierarchical 
clustering as well as the EM-style iterative approach of k-means. The algorithm uses a non- 
parametric procedure for splitting one-dimensional distributions based on a modified Hartigan’s 
dip statistic ( ]Hartigan fc Hartigan|19^ I and isotonic regression. Neither isotonic regression nor 
Hartigan’s method involve adjustable parameters (aside from selection of a statistical significance 
threshold). In particular, no scale parameter is needed for density estimation. Furthermore, 
since the core step performed at each iteration is one-dimensional (ID) clustering applied to 
projections of data subsets onto lines, we avoid the curse of dimensionality (the tradeoff being 
that we cannot handle clusters of arbitrary shape). 

This paper is organized as follows. First we describe an algorithm for splitting a ID sample 
into unimodal clusters. This procedure forms the basis of the p-dimensional clustering technique. 
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Figure 1: (A,B,C) Histograms for three simulated ID distributions with n = 1000 datapoints. 

The first represents a single cluster whereas the second and third are sampled from bimodal 
probability distributions. (D,E,F) show the corresponding sequence of spacings (finite differ¬ 
ences) between adjacent points. The dark curves are the outputs of down-up isotonic regression 
(defined in Section]^ and Appendix]^ for approximating these series. (G,H,I) show the corre¬ 
sponding normalized spacings that are fit using up-down isotonic regression. The peaks in the 
dark curves of (H) and (I) indicate that the set of samples should be split into two clusters, 
whereas the dip in (G) is not used to split distribution since the unimodality hypothesis in this 
case is not rejected. 

ISO-SPLIT, defined in Section 3. Simulation results are presented in Section 4, comparing ISO¬ 
SPLIT with three standard clustering techniques. In addition to quantitative comparisons using 
a measure of accuracy, examples illustrate situations where each algorithm performs best. The 
fifth section is an application of the algorithm to spike sorting of neuronal data. Next we discuss 
computational efficiency and scaling properties. Finally, Section 8 summarizes the results and 
discusses the limitations of the method. The appendices cover implementation details for isotonic 
regression, generation of synthetic datasets for simulations, and provide evidence for insensitivity 
to parameter adjustments. 
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2 Clustering in one dimension 


Any approach overcoming the above limitations must at least be able to do so in the simplest, ID 
case (p = 1). Here we develop a non-parametric approach to ID clustering using a statistical test 
for unimodality and isotonic regression. The procedure will then be used as the kernel operation 
in the more general situation (p > 2) as described in Section]^ 

Clustering in ID is unique because the input data can be sorted. The problem reduces to 
selecting cut points (real numbers between adjacent data points) that split the data into K 
clusters, each corresponding to an interval on the real line. We assume that the clusters are 
unimodal so that adjacent clusters are separated by a region of relatively lower density. For 
simplicity we will describe an algorithm for deciding whether there is one cluster, or more than 
one cluster. In the latter case, a single cut point is determined representing the boundary 
separating one pair of adjacent clusters. Note that once the data have been split, the same 
algorithm may then be applied recursively on the left and right portions leading to further 
subdivisions, converging when no more splitting occurs. Thus the algorithm described here may 
be used as a basis for general ID clustering. 

Let a;i < • • • < a;„ be the sorted (assumed distinciQ real numbers (input data samples). Our 
fundamental assumption is that two adjacent clusters are always separated by a region of lower 
density. In other words, if oi and 02 are the centers of two adjacent ID clusters, then there 
exists a cut point oi < c < 02 such that the density near c is significantly less than the densities 
near both oi and 02 . The challenge is to define the notion of density near a point. The usual 
approach is to use histogram binning or kernel density methods. However, as described above, 
we want to avoid choosing a length scale e. 

Instead we use a variant of Hartigan’s dip test. The null hypothesis is that the set X = {xj} is 
an independent sampling of a unimodal probability density f{x), which by definition is increasing 
on [—oo,c] and decreasing on [c, 00 ]. The dip statistic is the Kolmogorov-Smirnov distance 

Dx = {sup \F{x) - S'x(a:^)|}, 

X 

where 

Sx{x) = #{j : Xj < x} 

is the empirical distribution and A is a unimodal cumulative distribution function that approx¬ 
imates Sx- In contrast to Hartigan’s original method, we use down-up isotonic regression to 
determine F. As outlined in Appendix down-up isotonic regression finds the best least- 
squares approximation to a function that is monotonically decreasing to a critical point c and 
then monotonically increasing. This algorithm is used to fit the sequence of spacings between 

^The data are assumed to be independent samples from a continuous probability distribution, thus distinct 
with probability one. More details are given in the discussion. 
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adjacent points, as illustrated in FigureThe spacings attain their minimum at the peak of the 
distribution. The function is then integrated to obtain the cumulative distribution function F. 
If the Dx statistic lies above a certain threshold t„, then the unimodality hypothesis is rejected. 
In this study we used a threshold of Tn = aj^fn with a = 1.2 (see Appendix [b|) . 

Hartigan’s test only produces an accept/reject result, and does not supply an optimal cut 
point separating the clusters. Let a:i < • • • < be the sorted (assumed distinct) real numbers 
(input data samples). Our goal is to obtain such a cut point c. Let Sj denote the spacing 
between adjacent points Xj and Xj+i, which is an approximation to the reciprocal of density at 
this location (we assume that the Xj are all distinct). To detect a density dip, we normalize 
these spacings by setting 



where tj is the corresponding spacing for the approximating unimodal distribution obtained 
above. As shown in we can then fit the new sequence of spacings using up-down isotonic 
regression, since we expect the normalized spacings to increase at the dip. The cut point is 
selected at the peak of this up-down fit to the spacings. 

There is a flaw with Hartigan’s test in the case where the number of points in one cluster (say 
on the far left) is very small compared with the total size n. This is because the absolute size of 
the dip depends only on the region at the interface between the two clusters, whereas the test for 
rejection becomes increasingly stringent with increasing n. To remedy this we compute a series 
of dip tests of sizes 4, 8,16, 32,..., [log 2 n\. For each size, two tests are performed, one starting 
from the left (more negative) side of the dataset, and one starting from the right (more positive). 
As soon as the unimodality hypothesis is rejected in one of these tests, the algorithm halts and 
the null hypothesis is rejected, and the cut point for that segment is returned. Otherwise it is 
accepted. 

3 Clustering in higher dimensions using one-dimensional 
projections 

In this section we address the p-dimensional situation (p > 2) and describe an iterative procedure, 
termed ISO-SPLIT, in which the ID routine is repeated as a kernel operation. The decision 
boundaries are less restrictive than fc-means which always splits space into Voronoi cells with 
respect to the centroids, as illustrated in Figure 

The proposed procedure is outlined in Algorithm[T] The input is a collection of n points in K^, 
and the output is the collection of corresponding labels (or cluster memberships). The approach 
is similar to agglomerative hierarchical methods in that we start with a large number of clusters 
(output of InitializeLahels) and iteratively reduce the number of clusters until convergence. 
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Figure 2: Illustration that the placement of decision boundaries for ISO-SPLIT is more flexible 
than for k-means clustering. The left panel shows an artificial 2D dataset where each of the 
three labels has points drawn from bivariate Gaussian pdfs. The middle panel shows the result 
of clustering with ISO-SPLIT, and the decision lines (in general hyperplanes). The right panel 
shows the same using k-means. Unlike in k-means, which assumes all clusters have the same 
variance, in ISO-SPLIT hyperplanes may be positioned closer to one of the centroids than the 
other, and need not be orthogonal to the line connecting centroids. (See color online.) 

However, in addition to merging clusters the algorithm may also redistribute data points between 
adjacent clusters. This is in contrast to agglomerative hierarchical methods. At each iteration, 
the two closest clusters (that have not yet been handled) are selected and all data points from the 
two sets are projected onto a line orthogonal to the proposed hyperplane of separation. The ID 
split test from the previous section is applied (see above) and then the points are redistributed 
based on the optimal cut point, or if no statistically significant cut point is found the clusters 
are merged. This procedure is repeated until all pairs of clusters are handled. 

The best line of projection may be chosen in various ways. The simplest approach is to use 
the line connecting the centroids of the two clusters of interest. Although this choice may be 
sufficient in most situations, the optimal hyperplane of separation may not be orthogonal to this 
line. The approach we used in our implementation is to estimate the covariance matrix of the 
data in the two clusters (assuming Gaussian distributions with equal variances) and use this 
to whiten the data prior to using the above method. The function GetProjectionDirection in 
Algorithm returns a unit vector V representing the direction of the optimal projection line, 
and the function Project simply returns the inner product of this vector with each data point. 

Similarly, there are various approaches for choosing the closest pair of clusters at each iteration 
(FindClosestPair). One way is to minimize the distance between the two cluster centroids. Note, 
however, that we don’t want to repeat the same ID kernel operation more than once. Therefore, 
the closest pair that has not yet been handled is chosen. In order to avoid excessive iterations 
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Algorithm 1 

function ISO-SPLIT({yi,..., y„},Q;) 

{Li,..., L„} ^ InitializeLabels({j/i,... ,y„}) 

UsedPairs ^ {} 

loop 

[fci, ^ 2 , exists] ^ FindClosestPair({?/i,..., j/„}, {Li,..., L„}, UsedPairs) 
if not exists then i> Converged 

break 
end if 

51 = {j ■ Lj = ki) 

5 2 = {j ■■ Lj = fca} 

V <— GetProjectionDirection(5'i, S' 2 ) 

Xi <- {Project(U,?/j) : j € Si} 

X 2 ^ {Project(U, 2 /j) : j G S' 2 } 

[c, do-Split] ^ ComputeOptimalCutpoint(Ai U A 2 ) 
if do_split then > Redistribute according to cut point 
for all j G Si U S 2 do 

if Project(L, j/j) < c then Lj ^ ki 
else Lj ^ k2 
end if 
end for 
elsei> Merge 

Lj i — ki Vj G Si U S2 
end if 

UsedPairs ^ UsedPairs U {(Si, S 2 ), (S 2 , Si)} 

end loop 

> Reassign labels to be in {1,... ,K} 

L ^ Reniap(U) 

return L 
end function 



True 


ISO-SPLIT 


K-means (K known) 



Figure 3: K-means assumes that the cluster populations and variances are all the same. In 
this 2D example (corresponding to the K = 6 case of Simulation 2 (Anisotropic) below), 
two relatively small clusters are merged by k-means while a larger cluster is split. On the other 
hand, ISO-SPLIT makes no assumptions about relative cluster sizes, thus is able to handle this 
situation. Gaussian mixture clustering with unknown K (determined using BIG) also fails in this 
case. DBSGAN fails because the clusters do not all have the same density. Note that ISO-SPLIT 
was not given information about the true number of clusters nor the expected density. 

we used a heuristic for determining whether a particular cluster pair (or something very close to 
it) had been previously attempted. 

The function InitializeLabels creates an initial labeling (or partition) of the data. This may 
be implemented using the fc-means algorithm with the number of initial clusters chosen to be 
much larger than the expected number of clusters in the dataset, the assumption being that the 
output should not be sensitive once Ainitiai is large enough (see Appendix]^. For our tests we 
used the minimum of 20 and four times the true number of clusters. Since datasets may always 
be constructed such that our choice of ATinitiai is not large enough, we will seek to improve this 
initialization step in future work. 

The critical step is ComputeOptimalCutpoint, which is the ID clustering procedure described 
in the previous section, using a threshold of = ajyjn. 

Figure [^highlights a case where ISO-SPLIT outperforms k-means, DBSGAN, and GMM (for 
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Figure 4: Clustering of synthetic 2D data with K = 2 clusters (one of which is non-convex) 
which cannot be separated by a hyperplane. Unlike ISO-SPLIT, DBSCAN can handle this 
situation well. However, as the last two panels show, the scale parameter e must be properly 
chosen. 
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Figure 5: Histogram of the ID probability distribution used to generate skewed, non-Gaussian 
clusters in Simulation 3. See Equation Q. 

unknown K). This example was selected from Simulation 2 as will be described in Section]^ 
Unlike k-means and DBSCAN, ISO-SPLIT makes no assumptions about relative cluster popu¬ 
lation sizes and peak densities. On the other hand, Figure illustrates a case where DBSCAN 
(when properly tuned) performs better than the other methods. 


4 Method comparison via simulation 


A series of experiments were performed to compare the various approaches considered in this 
paper. We use an accuracy measure 

K 


= -E' 

K ^ 


c,7r(c) ^c,7r(c 


Lf \ I vp It / \ 

c=l \ / 

where K is the true number of clusters, j is the number of points in true class % labeled by the 
algorithm as class j, Ui := j i® ^^e true size (population) of cluster i, and n' := 

the size of the found cluster j. Finally, 7r(c) is the class number in the second labeling matching 
most closely to class c in the true labeling (i.e., maximizing nc_,r(c)); note that this need not be a 
permutation of the original labels. The summand of the measure a is the smaller of precision and 


recall, and is thus a lower bound on the F-measure ( Zaki fc Meira Ji] 2014 Ch. 17) (the latter 
being the harmonic mean of precision and recall). We prefer this measure of cluster similarity 
over many others in the literature because it weights each cluster equally, allowing us to assess 
fairly the performance when there is a wide range of cluster populations. (Such a metric is also 
the relevant one for the application presented in Sec. i) Note also that the contribution of each 
cluster is the minimum of (a) the extent to which the cluster is not split, and (b) the extent to 
which the cluster is not merged with another cluster. Thus a contribution of 100% means that 
the cluster is labeled perfectly with respect to both sensitivity (recall) and specificity. Unlike the 
F-measure, a value of 50% could mean that the cluster was either split evenly or merged with a 
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Figure 6: Example run from Simulation 1 (Isotropic). All investigated approaches perform 
well when clusters are well-spaced and isotropic with equal variances and populations. 
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Figure 7: Example run from Simulation 3, with non-Gaussian (skewed) clusters. ISO-SPLIT is 
density-based and able to handle this situation, whereas k-means and GMM fail due to violation 
of their underlying assumptions. Despite being density based, DBSGAN also fails since it cannot 
handle datasets containing clusters of widely differing densities. 
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Figure 8: Example run from Simulation 4 in which clusters are tightly packed (zq = 1.7). The 
two density based methods (ISO-SPLIT and DBSCAN) erroneously merge adjacent clusters in 
this case because they are not sufficiently separated by density. 
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K-means (K known) 



Figure 9: Projection onto two dimensions of an example run from Simulation 5 with the 
number of dimensions equal to p = 6. ISO-SPLIT performs well in this case whereas all other 
approaches struggle apart from GMM when given the correct K. 





ISO-SPLIT 


1=1 

1= 

IZZI 

K-means (K known) 
GMM (K known) 

GMM (BIG) 



DBSCAN 


Figure 10: Results of simulations from Table [T] with 6 true clusters. Accuracies for ISO-SPLIT 
are at least comparable to the other approaches in all cases except for when clusters are tightly 
packed (Simulation 4). ISO-SPLIT yields the highest accuracy in the non-Gaussian clusters case 
(Simulation 3). ISO-SPLIT and GMM (BIG) were the only algorithms for which no adjustable 
parameters were specified. 
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Simulation 1 (Isotropic) 

Gaussian, Isotropic, Equal pops. 

3 Clusters 

6 Clusters 

12 Clusters 

ISO-SPLIT 

98.7 ±0.1% 

98.2 ±0.1% 

96.6 ± 0.7% 

K-means (K known) 

99.0 ±0.1% 

98.7 ±0.1% 

98.4 ± 0.0% 

Gaussian Mixture (K known) 

98.9 ±0.1% 

98.6 ±0.1% 

97.9 ± 0.4% 

Gaussian Mixture (BIG) 

98.9 ±0.1% 

98.6 ±0.1% 

98.3 ±0.1% 

DBSGAN (e = 0.27) 

94.8 ± 0.4% 

95.0 ± 0.2% 

94.3 ± 0.2% 

Simulation 2 (Anisotropic) 




Gaussian, Anisotropic, Unequal pops. 




ISO-SPLIT 

94.7 ±2.2% 

93.6 ± 1.9% 

94.1 ± 1.3% 

K-means (K known) 

85.3 ± 3.7% 

85.7 ±2.3% 

80.5 ± 1.6% 

Gaussian Mixture (K known) 

98.4 ± 0.2% 

95.7 ± 1.6% 

91.2 ± 1.5% 

Gaussian Mixture (BIG) 

92.8 ± 2.0% 

95.4 ± 1.0% 

92.5 ± 1.1% 

DBSGAN (e = 0.3) 

66.7 ±3.3% 

74.0 ± 3.2% 

66.8 ± 2.9% 

Simulation 3 (Skewed) 




Non-Gaussian, Anisotropic, Unequal pops. 




ISO-SPLIT 

92.2 ± 2.3% 

94.4 ± 0.9% 

86.5 ± 3.8% 

K-means (K known) 

85.8 ± 3.2% 

81.9 ±2.6% 

79.5 ± 1.5% 

Gaussian Mixture (K known) 

78.0 ± 2.0% 

81.5 ± 1.5% 

79.4 ± 1.3% 

Gaussian Mixture (BIG) 

79.6 ± 1.3% 

83.6 ± 1.2% 

82.2 ± 1.2% 

DBSGAN (e = 0.3) 

78.5 ±4.1% 

72.9 ± 3.4% 

74.1 ± 2.7% 

Simulation 4 (Packed) 




Gaussian, Isotropic, Equal pops.. Tightly packed 




ISO-SPLIT 

79.1 ±4.8% 

55.3 ± 5.5% 

29.4 ± 3.9% 

K-means (K known) 

93.4 ± 0.2% 

91.5 ±0.2% 

89.9 ± 0.2% 

Gaussian Mixture (K known) 

93.0 ± 0.2% 

91.0 ±0.3% 

88.0 ± 0.5% 

Gaussian Mixture (BIG) 

92.9 ± 0.2% 

91.0 ±0.3% 

86.4 ± 1.1% 

DBSGAN (e = 0.3) 

33.3 ± 0.0% 

16.7 ±0.0% 

8.3 ±0.0% 

Simulation 5 (High-dimensional) 




Gaussian, Isotropic, Equal pops., # Dims = 6 




ISO-SPLIT 

88.0 ±3.1% 

96.3 ± 0.2% 

82.1 ±3.3% 

K-means (K known) 

76.4 ± 4.0% 

81.4 ±2.5% 

74.8 ± 2.4% 

Gaussian Mixture (K known) 

99.1 ±0.1% 

98.7 ±0.1% 

87.1 ± 2.3% 

Gaussian Mixture (BIG) 

72.5 ± 1.5% 

81.1 ±2.1% 

85.9 ± 1.5% 

DBSGAN (e = 1.5) 

56.7 ±5.3% 

54.2 ± 4.2% 

40.1 ±4.5% 


Table 1: Average accuracies over 20 trial simulations with standard errors. All data, except for 
in Simulation 5, were generated in two dimensions {p = 2). Simulation 1; Sample sizes were 500 
per cluster, = 0, C = 0, and cluster separations of zq = 2.5 (see text for details). Simulation 
2 : Sample sizes were randomly chosen between 100 and 1000 with ^ = 1.2, C = 2, and cluster 
separations of Zq = 2.5. Simulation 3: Same as Simulation 2 except with non-Gaussian/skewed 
clusters (see text for details). Simulation 4: Same as Simulation 1 except with clusters chosen 
closer together using zq = 1.7. Simulation 5: Same as the Simulation 2 except with a higher 
number of dimensions p = 6. 
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cluster of the same size. 

Simulations in two dimensions and higher were performed by generating random samplings 
from a mixed multivariate Gaussian distribution (except for the Skewed simulation) with clusters 
corresponding to the individual Gaussian sub-populations. The centers, spreads, orientations, 
anisotropies, and populations of the Gaussians were varied randomly. Specifically, the random 
covariance matrix for each cluster was defined as 


/ gJ-oC-t-riC 


S = i? 




gToC+r^i j 


where tq, ri,..., r„ are random numbers uniformly selected from [—1,1], C is the spread variation 
factor, ^ is the anisotropy variation factor and ii is a random rotation matrix. The cluster 
locations were chosen such that the clusters were packed tightly with the constraint that the 
solid ellipses corresponding to Mahalanobis distance z < did not intersect (see Appendix [C| for 
details). In Simulation 3 the Gaussian distributions were replaced by non-Gaussian distributions 
that were skewed in both dimensions. Specifically, the data points (prior to adjustment for the 
covariance matrix) were generated as R[F{zi), F{Z 2 )] for Zj normally distributed with 


F(z) = log(|z + 3|), 


( 1 ) 


and R a random rotation matrix (fixed for each cluster). The histogram for the ID non-Gaussian 
distribution is shown in Figure and examples of two-dimensional clusters are shown in Figure 

[71 

All experiments were performed on a Linux laptop with a 2.8GHz quad-core i7 processor 
and 8GB RAM. ISO-SPLIT was implemented in MATLAB with kernel routines in C-|—I- (a 
URL for our software is given in Sec. [^. The other clustering methods were implemented as 
follows. We used a MATLAB implementatior0 of k-means-|--|- with 100 trials/restarts. GMM 
was performed using Vedaldi fc Fulkersonj ( |2008| ) in MATLAB using EM iterations with 20 
restarts. Both k-means and GMM had the advantage of being provided with the true number of 
clusters as input. A second run of GMM was used to automatically select the number of clusters 


using the Bayesian information criterion (BIG) (Schwarz 1978). We used a multi-core C-|—I- 


implementation of DBSGAN ( Patwary et al.|2012 ), with e tuned by hand for each simulation to 
yield the highest average accuracy measure. 

All methods performed well in the Isotropic simulation under the conditions of equal pop¬ 
ulations, equal variances, and isotropic clusters, the ideal scenario for k-means (Figure [^. For 
the second simulation, ISO-SPLIT and GMM performed best. As expected, k-means did not do 


^Specifically we used the code from ( |Sorber|Retrieved June 2015[ l corrected to use the original “D^ weighting” 
recommended in (Arthur &: Vassilvitskii|2007 1 . 
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as well since the clusters were not isotropic and had unequal populations. The unequal cluster 
densities caused problems for DBSCAN. See Figure 

The number of dimensions was increased to 6 in the fifth simulation. In this case ISO-SPLIT 
performed at least comparably with the other methods. DBSCAN particularly struggled in this 
higher dimensional case. When K was not known, GMM also struggled as illustrated in Figure 

H 

In general, we see from the overall summary in Figurej^that ISO-SPLIT did as well or better 
than all other methods except in Simulation 4 for which clusters were tightly packed, presumably 
because the tighter clusters were no longer separated by regions of significantly lower density 
(see Figure]^. 


5 Application to spike sorting data 


The algorithms considered in this paper were applied to spike sorting of neural electrophysio- 
logical signals, that is, the clustering of spiking events in a time series into K clusters that can 
often be associated with individual neurons. Clustering is a key step in spike sorting. More 
specifically, we took 7 channels of data from a set of adjacent electrodes in a multielectrode 


array which records voltages from an ex vivo monkey retina (Litke et al. 20041; the 2 minutes 


of 20 kHz sampled time series data was supplied to us by the Chichilnisky Lab at Stanford. A 
set of points in that needed to be clustered was created as follows. Firstly the time series 
was high-pass filtered at 300 Hz (which insures that the signal mean is zero), then windows of 
time series of length 60 samples were extracted in which the minimum voltage dropped below 
— I20/iV. (We also applied an automatic procedure to remove windows that did not appear to be 
single spike events.) This gave n = 7275 windows. The data in each window was upsampled by 
a factor of 3 using Hann-windowed sync interpolation, then negative-peak aligned to the central 
time point, then stacked to give a length-1029 column vector. Dimension reduction to p = 10 
dimensions was finally done using PCA; that is, the 1029-by-7275 matrix A was replaced by the 
first 10 columns of V'^A, where V is the matrix whose columns are the eigenvectors of AA^ 
ordered by descending eigenvalue. This dimension reduction accounted for a fraction 68% of the 
total variance in the original data. Columns of the resulting A gave the 7275 points to cluster. 


The resulting clustering (neuron labelings) are shown in Figure 11 with the centroid wave¬ 
forms for each cluster displayed in Figure The adjustable parameters for k-means, GMM, 
and DBSCAN were carefully chosen to yield the same number of clusters {K = 8) as ISO-SPLIT. 
Each of the algorithms produced a splitting into into qualitatively distinct centroid waveforms, 
suggesting that at least 8 detectable neural units are present. Various validation methods, and 
human judgements, are now commonly applied to determine whether each cluster is a single 
neural unit (e.g. Hill et aL]|2011 Barnett et al.|[2M6 l; we do not attempt to perform that here. 
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ISO-SPLIT 


K-means {K = 8) 


Gaussian Mixture (K = 8) 



Figure 11: Spike sorting clustering application on real data with n = 7275 points in 10 di¬ 
mensions. The first three dimensions of the 10-dimensional PCA feature space are shown, with 
coloring indicating cluster (neuronal unit) membership produced by each clustering algorithm 
indicated. The classifications were highly consistent between the various methods when k-means, 
GMM, and DBSCAN were tuned to yield the same number of clusters as ISO-SPLIT. The red 
dotted lines indicate axes intersecting at the origin. 


We emphasize that ISO-SPLIT required no adjustment of free parameters, in contrast to all 
but the (poorly-performing) GMM with BIG. Indeed, when the BIG was used to select K in 
GMM, many more clusters were identified, most of them duplicates, or artificially split clusters 


(see Figure 12). An additional run of DBSGAN with e reduced from 70 to 60 identified an 
additional cluster, demonstrating the sensitivity of the output to adjustments in this parameter. 
Further analysis is required to determine which results are closer to the truth. 


6 Computational efficiency 

Each iteration of ISO-SPLIT comprises two steps. The first step, projection onto ID space, 
has computation time 0{nQp) where p is the number of dimensions and Uq < n is the number 
of points involved in the two clusters of interest. The second step is ID clustering using the 
Hartigan test and isotonic regression and has time complexity O(no). Due to the complexity 
of the cluster redistributions at each step, it is difficult to theoretically estimate the number 
iterations required for convergence. 

Table shows empirical average computation times for the simulations of Table Overall 
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3 Clusters 

6 Clusters 

12 Clusters 

Simulation 1 (Isotropic) 

Gaussian, Isotropic, Equal pops. 




ISO-SPLIT 

0.06 

0.12 

0.41 

K-means (K known) 

0.13 

0.57 

2.50 

Gaussian Mixture (K known) 

0.03 

0.16 

1.28 

Gaussian Mixture (BIG) 

0.15 

0.92 

9.78 

DBSGAN (e = 0.27) 

0.04 

0.04 

0.05 

Simulation 2 (Anisotropic) 

Gaussian, Anisotropic, Unequal pops. 




ISO-SPLIT 

0.03 

0.09 

0.33 

K-means (K known) 

0.18 

0.61 

2.35 

Gaussian Mixture (K known) 

0.04 

0.14 

0.79 

Gaussian Mixture (BIG) 

0.18 

0.88 

6.63 

DBSGAN (e = 0.3) 

0.04 

0.04 

0.06 

Simulation 3 (Skewed) 

Non-Gaussian, Anisotropic, Unequal pops. 




ISO-SPLIT 

0.04 

0.08 

0.58 

K-means (K known) 

0.18 

0.54 

2.36 

Gaussian Mixture (K known) 

0.05 

0.13 

0.88 

Gaussian Mixture (BIG) 

0.36 

1.37 

9.79 

DBSGAN (e = 0.3) 

0.04 

0.04 

0.06 

Simulation 4 (Packed) 

Gaussian, Isotropic, Equal pops., Tightly packed 




ISO-SPLIT 

0.04 

0.15 

0.63 

K-means (K known) 

0.17 

0.70 

3.08 

Gaussian Mixture (K known) 

0.06 

0.23 

1.28 

Gaussian Mixture (BIG) 

0.19 

1.07 

8.77 

DBSGAN (e = 0.3) 

0.04 

0.04 

0.05 

Simulation 5 (High-dimensional) 

Gaussian, Isotropic, Equal pops., # Dims = 6 




ISO-SPLIT 

0.04 

0.17 

6.53 

K-means (K known) 

0.27 

0.86 

5.09 

Gaussian Mixture (K known) 

0.02 

0.09 

0.68 

Gaussian Mixture (BIG) 

0.30 

1.21 

7.51 

DBSGAN (e = 1.5) 

0.04 

0.05 

0.09 


Table 2: Average computation times in seconds per trial run for the simulations of Table [T] 
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K-means (K = 8) 


Gaussian Mixture (K = 8) 
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Figure 12: Spike waveforms corresponding to the centroids of the clusters from Figure [TT| Rows 
represent electrode channels and columns represent distinct neurons/clusters. 


prefactors in the running times should not be given too much meaning, since they are highly 
implementation-dependent. In almost every case, GMM with unknown K took the longest since 
the algorithm needed to be run several times to find the optimal number of clusters using the 
BIG. Even when K was known, GMM and k-means were rerun many times (20 for GMM and 
100 for k-means), and therefore took longer on average than ISO-SPLIT in almost every case. 


With a highly optimized C-I--I- implementation (Patwary et al. 2012), DBSGAN was the most 


efficient algorithm of those considered. There was a significant increase in ISO-SPLIT’s run time 
from 6 to 12 clusters (especially in Simulation 5), but it should be noted that both K and n 
were increased by a factor of 2, since the number of points per cluster was fixed. 

Future work will investigate the theoretical bounds on the number of iterations required 
for convergence. Here we present empirical estimates for the scaling properties with increasing 
values of n (the number of samples), p (the number of dimensions), and iLtrue (the true number of 
clusters). The most time-consuming step in each iteration (isotonic regression) is independent of 
both p and ititrue but the unknown quantity is the number of iterations required for convergence. 
Figure [T^ shows the results of three simulations suggesting that computation time scales linearly 
with all three simulation parameters. 

Using code profiling in MATLAB, we found that the majority of time (approx. 90%) was 
spent performing ID clustering (Section]^ as compared with finding the best pair of clusters to 
compare, computing centroids, and projecting data onto lines. Of course this figure will depend 
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Figure 13: Empirical dependence of average computation time on three simulation parameters, 
averaged over 20 repeats; (A) number of points per sample, with number of true clusters fixed 
at 6 and p fixed at 2; (B) number of true clusters, with n fixed at 1000 and p fixed at 2; (C) 
number of dimensions with n fixed at 1000 and number of true clusters fixed at 6, with noise 
dimensions added past the first two dimensions. 

on p, AT, n, and the structure of the data. 

7 Discussion 

We have shown that, for the target application of spike sorting, our new technique produces 
results that are consistent with those of standard clustering techniques (k-means, GMM, DB- 
SCAN). Yet the key advantage of ISO-SPLIT is that it does not require selection of scale param¬ 
eters nor the number of clusters. This is very important in situations where manual processing 
steps are to be avoided. Automation is also critical when hundreds of clustering runs must be 
executed within a single analysis, e.g., applications of spike sorting with large electrode arrays. 
Furthermore, the accuracy of ISO-SPLIT appears to exceed that of standard techniques in the 
context of many simulations performed in this study. Most notably, it excels when clusters are 
non-Gaussian with varying populations, orientations, spreads, and anisotropies. 

While ISO-SPLIT outperforms standard methods in situations satisfying the assumptions of 
the method, the algorithm has general limitations and is not suited for all contexts. Because 
ISO-SPLIT depends on statistically significant density dips between clusters, erroneous merging 
occurs when clusters are positioned close to one another (see the Packed simulation). Gertainly 
this is a challenging scenario for all algorithms, but k-means or mixture models are better suited 
to handle these cases. On the other hand, if the underlying density has dips which separate 
clusters, ISO-SPLIT will find them for sufficiently large n. 

Since the one-dimensional tests are repeated at every iteration as the clusters are merged 
and redistributed, it is not possible to interpret the significance threshold of the one-dimensional 
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tests in the context of the multi-dimensional clustering. Further exploration is therefore needed 
to understand the expected false splitting and merging rates. 

Our theory depends on the assumption that the data arise from a continuous probability 
distribution. While no particular noise model is assumed, we do assume that, after projection 
onto any ID space, the distribution is locally well approximated by a uniform distribution. This 
condition is satisfied for any smooth probability distribution. In particular, it guarantees that no 
two samples have exactly the same value (which could lead to an infinite estimate of pointwise 
density). Situations where values are drawn from a discrete grid (e.g., an integer lattice) will fail 
to have this crucial property. One remedy for such scenarios could be to add random offsets to 
the datapoints to form a continuous distribution. 

Clusters with non-convex shapes may be well separated in density but not separated by a 
hyperplane (Figure]^. In these situations, alternative methods such as DBSCAN are preferable. 

But even when clusters are convex, a pair may be oriented such that the separating hyperplane 
is not orthogonal to the line connecting the centroids. 

While each iteration is efficient (essentially linear in a subset of the number of points of 
interest), computation time may be a concern since the number of iterations required to converge 
is unknown. Empirically, total computation time appears to increase linearly with the number 
of clusters, the number of dimensions, and the sample size. 

As mentioned above, a principal advantage of ISO-SPLIT is that it does not require parameter 
adjustments. Indeed, the core computational step is isotonic regression, which does not rely on 
any tunable parameters. Two parameters are fixed once and for all, the threshold of rejecting the 
unimodality hypothesis for the ID tests, and ATinitiai, the initial number of clusters. In Appendix 
[^we argue that the algorithm is not sensitive to these values over reasonable ranges. 

8 Conclusion 

A multi-dimensional clustering technique, ISO-SPLIT, based on density clustering of one-dimensional 
projections was introduced. The algorithm was motivated by the electrophysiological spike 
sorting application. Unlike many existing techniques, the new algorithm does not depend on 
adjustable parameters such as scale or a priori knowledge of the number of clusters. Using sim¬ 
ulations, ISO-SPLIT was compared with k-means, Gaussian mixture, and DBSCAN, and was 
shown to outperform these methods in situations where clusters were separated by regions of 
relatively lower density and where each pair of clusters could be largely split by a hyperplane. 
ISO-SPLIT was especially effective for non-Gaussian cluster distributions. Future research will 
focus on applying the algorithm to additional real-world problems as well as improving compu¬ 
tational efficiency. 

A MATLAB/C-I--I- implementation of ISO-SPLIT is freely available at the following URL: 
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http://github.com/magland/isosplit 
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A Up-down isotonic regression 


In this section we outline a computationally efficient variant of isotonic regression that provides 
the critical step in the kernel operation of ISO-SPLIT. Isotonic regression is a non-parametric 
method for fitting an ordered set of real numbers by a monotonically increasing (or decreas¬ 
ing) function. Suppose we want to find the best least-squares approximation of the sequence 
xi,... ,Xn by a monotonically increasing sequence. Considering the more general problem that 
includes weights, we want to minimize the objective function 

n 

P{y) = Xif, ( 2 ) 

i=l 


subject to 


yi < y 2 < ■ ■ ■ < Vn- 


This may be solved in linear time using the pool adjacent violators algorithm (PAVA) (Robertson 


et al.|[T988 l; we do not include the full pseudocode for this standard algorithm but note that it 


is essentially the same as PAVA-MSE in Algorithm 

As discussed above, ISO-SPLIT depends on a variant of isotonic regression, which we call 
updown isotonic regression. In this case we need to find a turning point j/h such that j/i < y 2 < 
■ ■ • ^ Vb and ys > ys+i • ■ ■ > y-a- Again we want to minimize F{y) of Equation ([^. One way 
to solve this is to use an exhaustive search for b G {1,... ,n}. However, this would have O(n^) 
time complexity. 

A modified PAVA that finds the optimal b for the updown case in linear time is presented in 
Algorithm The idea is to perform isotonic regression from left to right and then right to left 
using a modified algorithm where the mean-squared error is recorded at each step. The turning 
point is then chosen to minimize the sum of the two errors. 

Downup isotonic regression is also needed by the algorithm. This procedure is a straightfor¬ 
ward modification of updown in Algorithm by negating both the input and output. 
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Algorithm 2 

function UpDownIsotonic(x) 

> Isotonic regression for increasing followed by decreasing 
n ^ length(a;) 

b ^ FindOptimalB(a;) 

y^^'> ^ IsotonicIncreasing([a;i,..., xt]) 

^ IsotonicDecreasing([a;6,..., x„]) 

y ^ \yW M) ( 2 ) ( 2 ) , 

return y 
end function 

function FindOptimalB(x) 

> Find where to switch direction 

^ X 

<-Reverse(a;) 

^(1) ^ PAVA-MSE(a;i) 

-ir- Reverse(PAVA-MSE(a;*^^^)) 

Find b to minimize 

return b 
end function 

function REVERSE([a;i,..., Xn]) 

> Reverse the ordering 

return ... ,a:i] 

end function 

function PAVA-MSE([a;i,..., , w„]) 

t> Modified PAVA to return MSE at every index 

f •<— 1, j •<— 1 

count [i] ^ 1, wcount[f] ^ Wj 
sum[z] ^ WjXj, sumsqr[i] ^ Wjx'j 
yj i — 0 


for j = 2 .. .n do 

i ^ i + 1 

count [i] ^ 1, wcount[i] t— Wj 
sum[f] ^ WjXj, sumsqr[i] ^ Wjxj 

^ Mj-i 

loop 

if I = 1 then break 
end if 

if sum[i — l]/count[i — 1] < sum[z]/count[f] then break 
else[> Merge the blocks 

/ibefore ^ sumsqr[z — 1] — sum[z — l]^/count[z — 1] 

/ibefore ^ ^before + SUmSqr[z] - SUm[z]2/cOUnt [z] 

count [z — 1] ^ count [z — 1] + count [z], wcount[z — 1] wcount[z — 1] + wcount[z] 
sum[z — 1] ^ sum[z — 1] + sum[z], sumsqr[z — 1] ^ sumsqr[z — 1] + sumsqr[z] 
/iafter ^ sumsqr[z — 1] — sum[z — 1]^/count [z — 1] 

yj i yj //after //before 
Z ^ Z — 1 

end if 
end loop 
end for 
return y 
end function 
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a 

Accuracy 

Time ( s ) 

0.90 

63 % 

0.15 

1.00 

67 % 

0.14 

1.10 

86 % 

0.17 

1.20 

89 % 

0.18 

1.30 

91 % 

0.14 

1.40 

89 % 

0.12 

1.50 

89 % 

0.12 

1.60 

91 % 

0.13 

1.70 

87 % 

0.13 

1.80 

88 % 

0.12 

1.90 

88 % 

0.13 

2.00 

90 % 

0.14 

2.10 

85 % 

0.11 

2.20 

70 % 

0.12 

2.30 

64 % 

0.10 


Table 3: Results of Simulation 2 (ISO-SPLIT, 6 clusters) repeated with varying threshold pa¬ 
rameter a demonstrating insensitivity to the choice of this parameter (when within a reasonable 
range). 

B Sensitivity to parameters 

In this work we have claimed that ISO-SPLIT does not require parameter adjustments which 
depend on the application or nature of the dataset, thus facilitating fully automated clustering. 
However, practically speaking, there are two values mentioned in this paper that need to be set. 
In this section we demonstrate that the algorithm is not sensitive to these choices provided that 
they fall within a reasonable range. 

First, the threshold Tn = aj^Jn for rejecting the unimodality hypothesis needs to be chosen. 
To demonstrate, we ran Simulation 2 again with varying a. The results are found in Table 
For a between 1.2 and 2.0, the accuracies remained virtually constant. 

The second value to be set is RTmitiai, the number of clusters used to initialize the algorithm. 
Our hypothesis is that the choice of this parameter will have no affect on the output assuming 
that it is chosen large enough. This is supported in Table As discussed above, larger values of 
A'initiai will lead to significantly longer run times (quadratic time dependence), so it is important 
not to set this value to be much higher than necessary. 
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Initial K 

Accuracy 

Time (s) 

3 

44% 

0.03 

6 

87% 

0.08 

12 

92% 

0.09 

24 

90% 

0.13 

48 

89% 

0.19 

96 

88% 

0.32 


Table 4: Results of Simulation 2 (ISO-SPLIT, 6 clusters) repeated with varying RTinitiai demon¬ 
strating insensitivity to the choice of this parameter (above a certain value). As expected, 
computation time increased for larger A'initiai- 


C Packing Gaussian clusters for simulations 


Simulations 1-5 required automatic generation of synthetic datasets with fixed numbers of clus¬ 
ters of varying densities, populations, spreads, anisotropies, and orientations. The most challeng¬ 
ing programming task was to determine the random locations of the cluster centers. If clusters 
were spaced out too much then the clustering would be trivial. On the other hand, overlapping 
clusters cannot be expected to be successfully separated. Here we briefly describe a procedure 
for choosing the locations such that clusters are tightly packed with the constraint that the solid 
ellipsoids corresponding to Mahalanobis distance zg do not intersect. Thus zg is a constant con¬ 
trolling the tightness of packing, fixed for each simulation. In above simulations we considered 
zg = 2.5 and 1.7. 


The clusters are positioned iteratively, one at a time. Each cluster is positioned at the origin 
and then moved out radially in small increments of a random direction until the non-intersection 
criteria is satisfied. Thus we only need to determine whether two clusters defined by (^i, Ei) and 
(fj- 2 , E 2 ) are spaced far enough apart. Here fjLj are the cluster centers and Ej are the covariance 
matrices. The problem boils down to determining whether two arbitrary p-dimensional ellipsoids 
intersect. Surprisingly this is a nontrivial task, especially in higher dimensions, but an efficient 


iterative solution was discovered by Lin & Han (2002). For the present study, the Lin-Han 
algorithm was implemented in MATLAB. 
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